Universal Entropy of Word Ordering Across Linguistic Families

Authors

  • Marcelo A. Montemurro
  • Damián H. Zanette
Abstract

BACKGROUND: The language faculty is probably the most distinctive feature of our species, and endows us with a unique ability to exchange highly structured information. In written language, information is encoded by the concatenation of basic symbols under grammatical and semantic constraints. As is also the case in other natural information carriers, the resulting symbolic sequences show a delicate balance between order and disorder. That balance is determined by the interplay between the diversity of symbols and their specific ordering in the sequences. Here we used entropy to quantify the contribution of different organizational levels to the overall statistical structure of language.

METHODOLOGY/PRINCIPAL FINDINGS: We computed a relative entropy measure to quantify the degree of ordering in word sequences from languages belonging to several linguistic families. While a direct estimation of the overall entropy of language yielded values that varied across the families considered, the relative entropy quantifying word ordering was almost constant for all of them.

CONCLUSIONS/SIGNIFICANCE: Our results indicate that, despite the differences in the structure and vocabulary of the languages analyzed, the impact of word ordering on the structure of language is a statistical linguistic universal.
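The abstract does not say which entropy estimator was used, so the following is only a minimal sketch of the general idea: measure how much information is carried by word order by comparing a text with a copy whose words have been randomly shuffled. It swaps in a simple bigram plug-in estimate (the empirical mutual information between adjacent words), which is an assumption made here for illustration and not the authors' method. The function names `entropy_bits`, `ordering_information`, and `shuffled_copy`, and the toy text, are hypothetical.

```python
import math
import random
from collections import Counter


def entropy_bits(counts):
    """Shannon entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ordering_information(words):
    """Plug-in estimate of the bits per word carried by adjacent-word ordering.

    Computed as H(next) - H(next | previous) over adjacent word pairs,
    i.e. the empirical mutual information between neighbouring words.
    This is an illustrative stand-in, not the estimator used in the paper.
    """
    pairs = list(zip(words, words[1:]))
    h_prev = entropy_bits(Counter(prev for prev, _ in pairs))
    h_next = entropy_bits(Counter(nxt for _, nxt in pairs))
    h_joint = entropy_bits(Counter(pairs))
    h_conditional = h_joint - h_prev  # H(next | previous)
    return h_next - h_conditional


def shuffled_copy(words, seed=0):
    """Return a copy with the same word frequencies but random word order."""
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Toy text (an assumption for illustration; real corpora are far larger).
text = ("the cat sat on the mat and the dog sat on the rug " * 50).split()

d_original = ordering_information(text)
d_shuffled = ordering_information(shuffled_copy(text))
print(f"ordered: {d_original:.3f} bits/word, shuffled: {d_shuffled:.3f} bits/word")
```

Shuffling preserves the word frequencies and destroys the ordering, so the gap between the two values is a rough indication of the structure contributed by word order alone. A bigram estimate captures only short-range correlations and is biased on small samples, so it should not be expected to reproduce the values reported in the paper.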


Similar resources

Complexity and universality in the long-range order of words

As is the case of many signals produced by complex systems, language presents a statistical structure that is balanced between order and disorder. Here we review and extend recent results from quantitative characterisations of the degree of order in linguistic sequences that give insights into two relevant aspects of language: the presence of statistical universals in word ordering, and the lin...


Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals

Maximum Entropy (MaxEnt) language models are powerful models that can incorporate linguistic and non-linguistic contextual signals in a unified framework with a convex loss. MaxEnt models also have the advantage of scaling to large model and training data sizes. We present the following two contributions to MaxEnt training: (1) By leveraging smaller amounts of transcribed data, we demonstrate th...


Learning biases predict a word order universal.

How recurrent typological patterns, or universals, emerge from the extensive diversity found across the world's languages constitutes a central question for linguistics and cognitive science. Recent challenges to a fundamental assumption of generative linguistics (that universal properties of the human language acquisition faculty constrain the types of grammatical systems which can occur) sugges...


Recursive Inconsistencies Are Hard to Learn: A Connectionist Perspective on Universal Word Order Correlations

Across the languages of the world there is a high degree of consistency with respect to the ordering of heads of phrases. Within the generative approach to language these correlational universals have been taken to support the idea of innate linguistic constraints on word order. In contrast, we suggest that the tendency towards word order consistency may emerge from non-linguistic constraints o...


The Secret's in the Word Order: Text-to-Text Generation for Linguistic Steganography

Linguistic steganography is a form of covert communication using natural language to conceal the existence of the hidden message, which is usually achieved by systematically making changes to a cover text. This paper proposes a linguistic steganography method using word ordering as the linguistic transformation. We show that the word ordering technique can be used in conjunction with existing t...


Volume 6

Publication date: 2011